
Medical Diagnosis


The Biased Oracle: Assessing LLMs' Understandability and Empathy in Medical Diagnoses

Yao, Jianzhou, Liu, Shunchang, Drui, Guillaume, Pettersson, Rikard, Blasimme, Alessandro, Kijewski, Sara

arXiv.org Artificial Intelligence

Large language models (LLMs) show promise for supporting clinicians in diagnostic communication by generating explanations and guidance for patients. Yet their ability to produce outputs that are both understandable and empathetic remains uncertain. We evaluate two leading LLMs on medical diagnostic scenarios, assessing understandability using readability metrics as a proxy and empathy through LLM-as-a-Judge ratings compared to human evaluations. The results indicate that LLMs adapt explanations to socio-demographic variables and patient conditions. However, they also generate overly complex content and display biased affective empathy, leading to uneven accessibility and support. These patterns underscore the need for systematic calibration to ensure equitable patient communication. The code and data are available at https://github.com/Jeffateth/Biased_Oracle
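The understandability proxy mentioned above is typically a readability formula. A minimal sketch, assuming the classic Flesch Reading Ease score (the abstract does not name the exact metrics used) with a naive syllable counter; real evaluations would use a library such as textstat:

```python
def count_syllables(word):
    # Naive heuristic: count runs of consecutive vowels as one syllable.
    vowels = "aeiouy"
    word = word.lower()
    count, prev_vowel = 0, False
    for ch in word:
        is_vowel = ch in vowels
        if is_vowel and not prev_vowel:
            count += 1
        prev_vowel = is_vowel
    return max(count, 1)

def flesch_reading_ease(text):
    # 206.835 - 1.015 * (words/sentence) - 84.6 * (syllables/word)
    sentences = max(text.count(".") + text.count("!") + text.count("?"), 1)
    words = text.split()
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))

# Higher score = easier to read; jargon-heavy clinical text scores far lower.
simple = "The test was fine. You are well."
jargon = "Echocardiographic evaluation demonstrated preserved ventricular function."
```

The gap between these two scores is exactly the kind of accessibility difference a readability proxy is meant to surface.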


SwasthLLM: a Unified Cross-Lingual, Multi-Task, and Meta-Learning Zero-Shot Framework for Medical Diagnosis Using Contrastive Representations

Sar, Ayan, Puri, Pranav Singh, Aich, Sumit, Choudhury, Tanupriya, Kumar, Abhijit

arXiv.org Artificial Intelligence

In multilingual healthcare environments, automatic disease diagnosis from clinical text remains a challenging task due to the scarcity of annotated medical data in low-resource languages and the linguistic variability across populations. This paper proposes SwasthLLM, a unified, zero-shot, cross-lingual, and multi-task learning framework for medical diagnosis that operates effectively across English, Hindi, and Bengali without requiring language-specific fine-tuning. At its core, SwasthLLM leverages the multilingual XLM-RoBERTa encoder augmented with a language-aware attention mechanism and a disease classification head, enabling the model to extract medically relevant information regardless of the language structure. To align semantic representations across languages, a Siamese contrastive learning module is introduced, ensuring that equivalent medical texts in different languages produce similar embeddings. Further, a translation consistency module and a contrastive projection head reinforce language-invariant representation learning. SwasthLLM is trained using a multi-task learning strategy, jointly optimizing disease classification, translation alignment, and contrastive learning objectives. Additionally, we employ Model-Agnostic Meta-Learning (MAML) to equip the model with rapid adaptation capabilities for unseen languages or tasks with minimal data. Our phased training pipeline emphasizes robust representation alignment before task-specific fine-tuning. Extensive evaluation shows that SwasthLLM achieves high diagnostic performance, with a test accuracy of 97.22% and an F1-score of 97.17% in supervised settings. Crucially, in zero-shot scenarios, it attains 92.78% accuracy on Hindi and 73.33% accuracy on Bengali medical text, demonstrating strong generalization in low-resource contexts.
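The Siamese contrastive module described above aligns embeddings of parallel medical texts across languages. A minimal sketch of that idea with an InfoNCE-style loss on toy vectors (not the paper's implementation, which operates on XLM-RoBERTa embeddings):

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def contrastive_alignment_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss: a sentence and its translation (anchor, positive)
    should score higher than unrelated texts (negatives)."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    negs = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + negs))

# Toy embeddings: an English clinical sentence, its Hindi translation
# (a nearby vector), and an unrelated sentence (a distant vector).
en = [1.0, 0.0, 0.2]
hi = [0.9, 0.1, 0.3]      # parallel text -> should yield a low loss
other = [-0.8, 0.5, 0.0]  # unrelated text -> pushed apart

loss = contrastive_alignment_loss(en, hi, [other])
```

Minimizing this loss over many (English, Hindi, Bengali) parallel pairs is what makes zero-shot transfer possible: a diagnosis head trained on English embeddings then works on the aligned Hindi and Bengali embeddings.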


Performance of Large Language Models in Supporting Medical Diagnosis and Treatment

Sousa, Diogo, Barbosa, Guilherme, Rocha, Catarina, Oliveira, Dulce

arXiv.org Artificial Intelligence

The integration of Large Language Models (LLMs) into healthcare holds significant potential to enhance diagnostic accuracy and support medical treatment planning. These AI-driven systems can analyze vast datasets, assisting clinicians in identifying diseases, recommending treatments, and predicting patient outcomes. This study evaluates the performance of a range of contemporary LLMs, including both open-source and closed-source models, on the 2024 Portuguese National Exam for medical specialty access (PNA), a standardized medical knowledge assessment. Our results highlight considerable variation in accuracy and cost-effectiveness, with several models demonstrating performance exceeding human benchmarks for medical students on this specific task. We identify leading models based on a combined score of accuracy and cost, discuss the implications of reasoning methodologies like Chain-of-Thought, and underscore the potential for LLMs to function as valuable complementary tools aiding medical professionals in complex clinical decision-making.
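The abstract mentions identifying leading models by a combined score of accuracy and cost but does not give the weighting. A hypothetical sketch of such a trade-off, assuming a simple weighted sum with normalized, inverted cost (the weights and model figures below are illustrative, not from the study):

```python
def combined_score(accuracy, cost_usd, max_cost, weight=0.8):
    """Weight on accuracy, (1 - weight) on normalized cheapness."""
    cost_term = 1.0 - (cost_usd / max_cost)
    return weight * accuracy + (1 - weight) * cost_term

# Illustrative numbers only: a strong-but-expensive model vs. a cheap one.
models = {
    "model_a": {"accuracy": 0.82, "cost_usd": 4.0},
    "model_b": {"accuracy": 0.79, "cost_usd": 0.5},
}
max_cost = max(m["cost_usd"] for m in models.values())
ranking = sorted(
    models,
    key=lambda name: combined_score(models[name]["accuracy"],
                                    models[name]["cost_usd"], max_cost),
    reverse=True,
)
```

Under this weighting the slightly less accurate but far cheaper model ranks first, which is the kind of cost-effectiveness result the study reports.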


MedAgent-Pro: Towards Multi-modal Evidence-based Medical Diagnosis via Reasoning Agentic Workflow

Wang, Ziyue, Wu, Junde, Low, Chang Han, Jin, Yueming

arXiv.org Artificial Intelligence

Developing reliable AI systems to assist human clinicians in multi-modal medical diagnosis has long been a key objective for researchers. Recently, Multi-modal Large Language Models (MLLMs) have gained significant attention and achieved success across various domains. With strong reasoning capabilities and the ability to perform diverse tasks based on user instructions, they hold great potential for enhancing medical diagnosis. However, directly applying MLLMs to the medical domain still presents challenges. They lack detailed perception of visual inputs, limiting their ability to perform quantitative image analysis, which is crucial for medical diagnostics. Additionally, MLLMs often exhibit hallucinations and inconsistencies in reasoning, whereas clinical diagnoses must adhere strictly to established criteria. To address these challenges, we propose MedAgent-Pro, an evidence-based reasoning agentic system designed to achieve reliable, explainable, and precise medical diagnoses. This is accomplished through a hierarchical workflow: at the task level, knowledge-based reasoning generates reliable diagnostic plans for specific diseases following retrieved clinical criteria, while at the case level, multiple tool agents process multi-modal inputs, analyze different indicators according to the plan, and provide a final diagnosis based on both quantitative and qualitative evidence. Comprehensive experiments on both 2D and 3D medical diagnosis tasks demonstrate the superiority and effectiveness of MedAgent-Pro, while case studies further highlight its reliability and interpretability. The code is available at https://github.com/jinlab-imvr/MedAgent-Pro.
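The two-level workflow described above (a task-level plan derived from clinical criteria, then case-level tool agents scoring individual indicators) can be sketched in miniature. The disease, indicator names, and thresholds below are illustrative stand-ins, not the paper's actual agents:

```python
# Case-level tool agents: each scores one indicator for one patient case.
def cup_disc_agent(case):
    # Quantitative indicator: cup-to-disc ratio (mocked segmentation output).
    return {"indicator": "cup_disc_ratio", "value": case["cdr"],
            "abnormal": case["cdr"] > 0.6}

def iop_agent(case):
    # Clinical-report indicator: intraocular pressure.
    return {"indicator": "iop", "value": case["iop"],
            "abnormal": case["iop"] > 21}

def diagnose(case, plan):
    """Run every tool agent in the plan, then decide from pooled evidence."""
    evidence = [agent(case) for agent in plan]
    abnormal = sum(e["abnormal"] for e in evidence)
    verdict = "positive" if abnormal >= 2 else "negative"
    return verdict, evidence

# Task level: a disease-specific plan (a toy glaucoma-style screen).
glaucoma_plan = [cup_disc_agent, iop_agent]

# Case level: one patient's multi-modal findings.
verdict, evidence = diagnose({"cdr": 0.72, "iop": 25}, glaucoma_plan)
```

The point of the structure is that the final verdict is traceable to named indicators and explicit thresholds, which is where the claimed explainability comes from.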


An Advanced NLP Framework for Automated Medical Diagnosis with DeBERTa and Dynamic Contextual Positional Gating

Khaniki, Mohammad Ali Labbaf, Saadati, Sahabeh, Manthouri, Mohammad

arXiv.org Artificial Intelligence

This paper presents a novel Natural Language Processing (NLP) framework for enhancing medical diagnosis through the integration of advanced techniques in data augmentation, feature extraction, and classification. The proposed approach employs back-translation to generate diverse paraphrased datasets, improving robustness and mitigating overfitting in classification tasks. Leveraging Decoding-enhanced BERT with Disentangled Attention (DeBERTa) with Dynamic Contextual Positional Gating (DCPG), the model captures fine-grained contextual and positional relationships, dynamically adjusting the influence of positional information based on semantic context to produce high-quality text embeddings. For classification, an Attention-Based Feedforward Neural Network (ABFNN) is utilized, effectively focusing on the most relevant features to improve decision-making accuracy. Applied to the classification of symptoms, clinical notes, and other medical texts, this architecture demonstrates its ability to address the complexities of medical data. The combination of data augmentation, contextual embedding generation, and advanced classification mechanisms offers a robust and accurate diagnostic tool, with potential applications in automated medical diagnosis and clinical decision support. This method demonstrates the effectiveness of the proposed NLP framework for medical diagnosis, achieving remarkable results with an accuracy of 99.78%, recall of 99.72%, precision of 99.79%, and an F1-score of 99.75%. These metrics not only underscore the model's robust performance in classifying medical texts with exceptional precision and reliability but also highlight its superiority over existing methods, making it a highly promising tool for automated diagnostic systems.
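The back-translation augmentation step described above translates a labeled clinical sentence into a pivot language and back, producing a paraphrase that keeps its label. A minimal sketch of the data flow; a real system would call a machine translation model, so the toy word-map "translators" here are purely illustrative:

```python
# Toy translation tables standing in for real MT models (hypothetical).
EN_TO_PIVOT = {"severe": "fuerte", "headache": "dolor de cabeza",
               "and": "y", "nausea": "nauseas"}
PIVOT_TO_EN = {"fuerte": "intense", "dolor de cabeza": "head pain",
               "y": "and", "nauseas": "nausea"}

def translate(text, table):
    out = text
    for src, dst in table.items():
        out = out.replace(src, dst)
    return out

def back_translate(text):
    # English -> pivot -> English yields a label-preserving paraphrase.
    pivot = translate(text, EN_TO_PIVOT)
    return translate(pivot, PIVOT_TO_EN)

def augment(dataset):
    """Double the training set: each (text, label) pair gains a paraphrase."""
    return dataset + [(back_translate(t), y) for t, y in dataset]

data = [("severe headache and nausea", "migraine")]
augmented = augment(data)
```

The paraphrase ("intense head pain ...") differs in surface form but keeps the "migraine" label, which is what mitigates overfitting in the downstream classifier.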


Multimodal Marvels of Deep Learning in Medical Diagnosis: A Comprehensive Review of COVID-19 Detection

Islam, Md Shofiqul, Hasan, Khondokar Fida, Shajeeb, Hasibul Hossain, Rana, Humayan Kabir, Rahman, Md Saifur, Hasan, Md Munirul, Azad, AKM, Abdullah, Ibrahim, Moni, Mohammad Ali

arXiv.org Artificial Intelligence

This study presents a comprehensive review of the potential of multimodal deep learning (DL) in medical diagnosis, using COVID-19 as a case example. Motivated by the success of artificial intelligence applications during the COVID-19 pandemic, this research aims to uncover the capabilities of DL in disease screening, prediction, and classification, and to derive insights that enhance the resilience, sustainability, and inclusiveness of science, technology, and innovation systems. Adopting a systematic approach, we investigate the fundamental methodologies, data sources, preprocessing steps, and challenges encountered in various studies and implementations. We explore the architecture of deep learning models, emphasising their data-specific structures and underlying algorithms. Subsequently, we compare different deep learning strategies utilised in COVID-19 analysis, evaluating them based on methodology, data, performance, and prerequisites for future research. By examining diverse data types and diagnostic modalities, this research contributes to scientific understanding and knowledge of the multimodal application of DL and its effectiveness in diagnosis. We have implemented and analysed 11 deep learning models using COVID-19 image, text, and speech (i.e., cough) data. Our analysis revealed that the MobileNet model achieved the highest accuracy of 99.97% for COVID-19 image data and 93.73% for speech data. However, the BiGRU model demonstrated superior performance in COVID-19 text classification with an accuracy of 99.89%. The broader implications of this research suggest potential benefits for other domains and disciplines that could leverage deep learning techniques for image, text, and speech analysis.


Understanding AI outputs: study shows pro-western cultural bias in the way AI decisions are explained

AIHub

Image credit: Anne Fehres and Luke Conroy & AI4Media / Better Images of AI / Data is a Mirror of Us / Licenced by CC-BY 4.0

Humans are increasingly using artificial intelligence (AI) to inform decisions about our lives. AI is, for instance, helping to make hiring choices and offer medical diagnoses. If you were affected, you might want an explanation of why an AI system produced the decision it did. Yet AI systems are often so computationally complex that not even their designers fully know how the decisions were produced. That's why the development of "explainable AI" (or XAI) is booming.


Self-Supervised Learning Featuring Small-Scale Image Dataset for Treatable Retinal Diseases Classification

Huang, Luffina C., Chiu, Darren J., Mehta, Manish

arXiv.org Artificial Intelligence

Automated medical diagnosis through image-based neural networks has increased in popularity and matured over the years. Nevertheless, it is constrained by the scarcity of medical images and expensive labor annotation costs. Self-Supervised Learning (SSL) is a good alternative to Transfer Learning (TL) and is suitable for imbalanced image datasets. In this study, we assess four pretrained SSL models and two TL models on treatable retinal disease classification using small-scale Optical Coherence Tomography (OCT) training sets ranging from 125 to 4,000 images with balanced or imbalanced distributions. The proposed SSL model achieves state-of-the-art accuracy of 98.84% using only 4,000 training images. Our results suggest that the SSL models provide superior performance under both balanced and imbalanced training scenarios. The SSL model with the MoCo-v2 scheme performs consistently well under the imbalanced scenario and, in particular, surpasses the other models when the training set contains fewer than 500 images.
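A core mechanism of the MoCo-v2 scheme highlighted above is its momentum update: the key encoder's weights trail the query encoder's via an exponential moving average, which keeps the dictionary of negatives consistent. A minimal numeric sketch of just that update rule (the full method also involves encoders, augmentations, and a queue):

```python
def momentum_update(query_weights, key_weights, m=0.999):
    """key <- m * key + (1 - m) * query, element-wise."""
    return [m * k + (1 - m) * q for q, k in zip(query_weights, key_weights)]

query = [0.5, -1.0, 2.0]   # updated by backprop each step (held fixed here)
key = [0.0, 0.0, 0.0]      # only ever updated by the moving average

# With m = 0.999 the key encoder drifts toward the query encoder slowly:
# after 1000 steps it has closed roughly 63% of the gap (1 - 0.999**1000).
for _ in range(1000):
    key = momentum_update(query, key)
```

The large momentum value is the design choice that matters: it makes the key representations change smoothly between steps, so older entries in the negative queue remain comparable to fresh ones.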


Enhancing Medical Support in the Arabic Language Through Personalized ChatGPT Assistance

Issa, Mohamed, Abdelwahed, Ahmed

arXiv.org Artificial Intelligence

This paper discusses the growing popularity of online medical diagnosis as an alternative to traditional doctor visits. It highlights the limitations of existing tools and emphasizes the advantages of using ChatGPT, which provides real-time, personalized medical diagnosis at no cost. It summarizes a research study that evaluated the performance of ChatGPT in Arabic medical diagnosis. The study involved compiling a dataset of disease information and generating multiple messages for each disease using different prompting techniques. ChatGPT's performance was assessed by measuring the similarity between its responses and the actual diseases. The results showed promising performance, with average scores of around 76% across similarity measures. Various prompting techniques were used, and chain prompting demonstrated a relative advantage. The study also recorded an average response time of 6.12 seconds for the ChatGPT API, which is considered acceptable but has room for improvement. While ChatGPT cannot replace human doctors entirely, the findings suggest its potential in emergency cases and for addressing general medical inquiries. Overall, the study highlights ChatGPT's viability as a valuable tool in the medical field.
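The evaluation above scores the similarity between a model's free-text answer and the true disease. The abstract does not specify which similarity measures were used; as a stand-in, this sketch uses token-level Jaccard similarity on normalized text:

```python
def tokens(text):
    # Lowercase and split on whitespace, treating hyphens as spaces.
    return set(text.lower().replace("-", " ").split())

def jaccard(a, b):
    """Fraction of shared tokens: |A intersect B| / |A union B|."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb)

response = "The symptoms suggest type 2 diabetes"
disease = "Type 2 Diabetes"
score = jaccard(response, disease)
```

All three tokens of the disease name appear in the six-token response, so the score is 0.5; averaging such scores over many diseases yields the kind of aggregate percentage the study reports.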


A Comprehensive Dataset and Automated Pipeline for Nailfold Capillary Analysis

Zhao, Linxi, Tang, Jiankai, Chen, Dongyu, Liu, Xiaohong, Zhou, Yong, Wang, Guangyu, Wang, Yuntao

arXiv.org Artificial Intelligence

Nailfold capillaroscopy is a well-established method for assessing health conditions, but the potential of automated medical image analysis using machine learning remains untapped despite recent advancements. The introduction of machine learning marks a pivotal shift, presenting automated medical image analysis as a promising alternative due to its higher accuracy compared to traditional image processing algorithms [5]. Recent studies have attempted to use single deep-learning models for tasks such as nailfold capillary segmentation [4, 8], measurement of capillary size and density [5], and white cell counting [9]. Despite notable achievements, the potential of automated medical image analysis remains untapped due to the urgent need for the annotated and extensive datasets essential for effectively training and fine-tuning deep neural networks. In this groundbreaking study, we present a pioneering effort in constructing a comprehensive dataset--321 images, 219 videos, and 68 clinic reports with expert annotations--that serves as a crucial resource for training deep-learning models. Leveraging this dataset, we propose an end-to-end nailfold capillary analysis pipeline capable of automatically detecting and measuring diverse capillary features.